Administrative rulemaking is often conceptualized as a straightforward translation of legislative delegations into agency rules, where bureaucrats adopt their most preferred policy given their constraints, often speeding up or slowing down the policy process to do so. The rulemaking process is indeed highly formulaic, requiring publication of a draft, external review, and public comment before a final rule is published. However, much of the politics of rulemaking may play out in the less formal and transparent process before a draft policy is published. In this stage, different priorities and constraints may drive decisions about which draft policies are published and when.
Theories of policymaking suggest that the politics of rulemaking may differ significantly in the pre-proposal stage. Yet, this part of the policy process has largely eluded quantitative study due to a lack of systematic data. This paper begins to fill this gap. We introduce a new dataset based on thousands of monthly internal project status reports by the U.S. Department of Transportation that allow us to track a rulemaking effort from its origin date through its finalization. Additionally, we develop a number of novel measures of prioritization and constraints that we are able to measure before and after a draft rule is made public, including the degree of agency resource constraints, internal agency prioritization, changes in sub-agency political appointees, congressional and presidential attention, and the threat of litigation and sanction by courts.
Using these new data and measures, we model the drivers of prioritization and constraint as a multi-stage policy process and identify different political forces at work in each stage. Specifically, we use a multi-stage hazard model to test factors that may affect the timing of rules being proposed or finalized, and thus identify predictors of prioritization and delay. Previous research suggests that greater internal agency resources—in terms of personnel and budgets—will speed up rulemaking during the largely opaque pre-proposal stage, while attention from external actors, including Congress and the president, will speed up rulemaking during the more transparent notice-and-comment period. Our data and measures allow new tests of these theories. We show that resources, complexity, and salience have different effects on the speed of rule-writing across different stages.
Our findings hold important implications for our understanding of transparency and bureaucratic policymaking within a representative democracy.
- Find a method to identify the year of each rule’s authorizing statute
- Add OIRA report dates
- Infer missing withdrawal dates from the date last observed in the Withdrawal stage
- Add subagency administrator and Secretary data
- Add OPM staff resources data for years other than 2016
- Properly code observations for previous and future stages
- Add covariates
```r
# Packages used throughout (assumed from the code below):
library(here)
library(dplyr)
library(magrittr)
library(ggplot2)
library(scales)
library(stringr)
library(tidyr)
library(reshape2)  # melt()
library(broom)     # tidy()
library(survival)
library(mstate)
library(ggfortify) # autoplot() for survfit objects

load(here("data/DOT-monthly.Rdata"))
dotMonthly %<>% filter(!is.na(initiated)) # select rule-month observations with a DOT report
load(here("data/DOT-perRule-perStage.Rdata"))
dotStage %<>% filter(!is.na(initiated)) # select rule-stage observations with a DOT report
load(here("data/DOT-perRule.Rdata"))
dotRIN %<>% filter(!is.na(initiated)) # select rule observations with a DOT report

dotMonthly %<>% group_by(RIN) %>%
  mutate(delay = paste(na.omit(color), collapse = ",")) %>%
  mutate(delayedever = grepl("Red|Yellow", delay))
```
Google spreadsheet of variables
We have initiation dates for 428 unique RINs. Of these, 134 were marked ‘delayed,’ 8 were marked ‘potentially delayed,’ and 75 were marked ‘on time’ at the time they were last observed. However, DOT increasingly marks projects as ‘not on a schedule’ (134 RINs were last observed in the “Black” category). 199 projects were delayed at some point, and 66 of these give a reason for delay.
317 RINs were initiated after 2008. 267 have Final Rules Published, 31 have been withdrawn, and 6 were terminated.
This table shows the average number of months after initiation at which certain milestones are reached:
| Average months until: | to Secretary | to OIRA | Published | N |
|---|---|---|---|---|
| ANPRM | 6 | 14 | 17 | n = 62 |
| NPRM | 20 | 31 | 38 | n = 348 |
| SNPRM | 91 | 87 | 102 | n = 17 |
| Interim Final Rule | 5 | 17 | 19 | n = 25 |
| Final Rule | 50 | 55 | 62 | n = 267 |
| Withdrawal | 34 | 87 | 54 | n = 31 |
| Termination | | | | n = 6 |
Adding DOT Internal Status Reports to OIRA and Unified Agenda Reports gives a richer picture of rulemaking. Data by RIN, date, and source:
```r
# If project is ongoing, define enddate as date last observed (for plotting active project status)
dotMonthly %<>% mutate(end = if_else(is.na(enddate), reportdate, enddate))
# Consolidate like stages for plotting:
dotMonthly$STAGE %<>% {gsub("Terminat.*", "Withdrawal", .)}
# note: the unescaped dot is intentional; ".2$" also matches suffixes like "/2" in "Final Rule/2"
dotMonthly$STAGE %<>% {gsub(".2$", "", .)}
dotMonthly$STAGE %<>% {gsub(".3$", "", .)}
dotMonthly$STAGE %<>% {gsub(".4$", "", .)}
dotMonthly$STAGE %<>% {gsub("Notice", "Proposed Rule", .)}
dotMonthly$STAGE %<>% {gsub("Other", "Undetermined", .)}
dotMonthly$STAGE %<>% {gsub("Prerule", "Proposed Rule", .)}
dotMonthly$STAGE %<>% {gsub("Interim Final Rule", "Final Rule", .)}
dotMonthly$STAGE %<>% {gsub("SNPRM", "Proposed Rule", .)}
```
```r
dotMonthly %>% # filter(reportdate > as.Date("2008-01-01")) %>%
  ggplot(aes(x = RIN)) +
  geom_linerange(aes(ymin = initiated, ymax = end, color = "DOT Reports"), size = .1) +
  geom_point(aes(y = DOTdate, color = "DOT Reports", shape = STAGE), size = .5) +
  geom_point(aes(y = ANPRMpublished, color = "DOT Reports"), shape = 2, size = .5) +
  geom_point(aes(y = NPRMpublished, color = "DOT Reports"), shape = 2, size = .5) +
  geom_point(aes(y = SNPRMpublished, color = "DOT Reports"), shape = 2, size = .5) +
  geom_point(aes(y = FinalRulePublished, color = "DOT Reports"), shape = 1, size = .5) +
  geom_point(aes(y = IFRpublished, color = "DOT Reports"), shape = 1, size = .5) +
  geom_point(aes(y = WithdrawalPublished), color = "red", shape = 4, size = 1) +
  geom_point(aes(y = Terminated), color = "red4", shape = 4, size = 1) +
  geom_point(aes(y = DATE_RECEIVED, color = "OIRA Reports", shape = STAGE), size = .5) +
  geom_point(aes(y = UnifiedAgendaDate, color = "Unified Agenda Reports", shape = STAGE), size = .5) +
  facet_grid(acronym ~ ., scales = "free_y", space = "free_y") +
  coord_flip() +
  scale_y_date(lim = c(as.Date("2000-01-01"), as.Date("2019-01-01")),
               breaks = date_breaks(width = "1 year"),
               labels = date_format("%y"),
               minor_breaks = NULL) +
  ggtitle('Significant DOT Rulemaking Projects Active Between 2008 and 2017') +
  labs(x = "RIN", y = "Date", color = "", shape = "") +
  theme(axis.text.y = element_text(size = 3, angle = 0, hjust = 1, vjust = 1),
        axis.ticks = element_blank(),
        strip.text.y = element_text(angle = 0),
        strip.background = element_blank(),
        legend.position = "top")
```
Notice how DOT (especially FTA and FHA) initiated a large number of projects shortly after Obama’s reelection, probably having waited for assurance that they would be able to complete their work.
```r
# stage histogram
ggplot(dotMonthly %>% filter(STAGE %in% c("Withdrawal", "Proposed Rule", "Other", "Final Rule", "Interim Final Rule"))) +
  stat_count(aes(DOTdate)) +
  facet_grid(STAGE ~ ., scales = "free_y") +
  labs(title = "Active per Month by Stage", x = "")
#scale_y_date(breaks=date_breaks(width = "1 year"), labels = date_format("%y"), minor_breaks=NULL)
```
```r
# Consolidate like stages for plotting:
dotRIN$STAGE %<>% {gsub("Terminat.*", "Withdrawal", .)}
dotRIN$STAGE %<>% {gsub(".2$", "", .)}
dotRIN$STAGE %<>% {gsub(".3$", "", .)}
dotRIN$STAGE %<>% {gsub(".4$", "", .)}
dotRIN$STAGE %<>% {gsub("Notice", "Proposed Rule", .)}
dotRIN$STAGE %<>% {gsub("Other", "Undetermined", .)}
dotRIN$STAGE %<>% {gsub("Prerule", "Proposed Rule", .)}
dotRIN$STAGE %<>% {gsub("Interim Final Rule", "Final Rule", .)}
dotRIN$STAGE %<>% {gsub("SNPRM", "Proposed Rule", .)}
dotRIN %>% filter(STAGE %in% c("Final Rule", "Withdrawal")) %>%
  mutate(end = if_else(is.na(enddate) & DOTdate != "2018-09-01", reportdate, enddate)) %>%
  #filter(prompt %in% c("Secretarial/Head of Operating Administration Decision", "2011 Retrospective Regulatory Review", "International Agreement", "Settlement Agreement")) %>%
  ggplot() +
  geom_point(aes(x = initiated, y = end, color = prompt, shape = STAGE)) +
  scale_y_date(limits = c(as.Date("2008-01-01"), as.Date("2019-01-01")),
               breaks = date_breaks(width = "1 year"),
               labels = date_format("%Y")) +
  #geom_smooth(aes(x = if_else(initiated > as.Date("2008-01-01"), initiated, NA), y = enddate), method = "lm") +
  labs(title = "Completed Rulemaking Projects by Prompting Reason and Outcome", x = "Initiated", y = "Published", color = "Prompt to Initiate Project", shape = "")
```
We are interested in two kinds of outcomes: (1) what happens? and (2) how long does it take?
Given these questions and the time-series nature of our data (monthly status reports with changing covariate values), we use duration models. Because we are interested in how policies may move through multiple stages of the policymaking process, we use a multi-state model. Doing so allows us to simultaneously model how different factors may affect a policy at different stages (i.e., states), while also modeling a common frailty. We could model each stage separately, but this would ignore correlation across stages within the same rule.
Two interpretable statistics address each of our main questions.
First, we estimate “transition probabilities” (i.e., the probability that a policy will advance to each possible future state at a given time) using the Aalen-Johansen estimator and the cumulative incidence function (which estimates the proportion of observations that have transitioned into and out of each state over time).
Second, we estimate the probability that a given transition will occur in a given time window using the Kaplan-Meier estimator. Covariates may increase or decrease the likelihood of a given transition, and thus the estimated rate of advancing from one stage to another in the policymaking process, and thus the Kaplan-Meier estimate that a given transition will occur after a given date.
To capture common elements of each rule, we use a penalized-likelihood random-effects model (a frailty model), with a frailty for each rulemaking project (each Regulatory Identification Number).
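As a sketch, a stratified Cox model with a gamma frailty per project, fit by penalized likelihood, might look like the following. The toy data here are simulated stand-ins for the rule-transition data prepared later in this document; the variable names (`id`, `trans`, `MAJOR`, `entry`, `exit`, `status`) mirror those used below.

```r
library(survival)

# Toy stand-in data: 40 projects (id), two transitions each, one binary covariate.
set.seed(1)
toy <- data.frame(
  id     = rep(1:40, each = 2),                    # rulemaking project (RIN)
  trans  = rep(c("Initiated to NPRM", "NPRM to Final Rule"), 40),
  MAJOR  = rep(rbinom(40, 1, .5), each = 2),       # covariate, constant within project
  entry  = 0,
  exit   = rexp(80, rate = 1/300),                 # days until transition
  status = 1                                       # all transitions observed
)

# Cox model stratified by transition, with a gamma frailty (random effect)
# for each project, estimated by penalized likelihood.
mod <- coxph(Surv(entry, exit, status) ~ MAJOR + strata(trans) +
               frailty(id, distribution = "gamma"),
             data = toy, method = "breslow")
```

With the real data, `toy` would be replaced by the rule-transition data frame built below.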
The paper most similar to the analysis we aim to conduct is “Surviving Phases: Introducing Multi-state Survival Models” by Metzger and Jones in Political Analysis. I have not yet found their replication materials very helpful, but their appendix has some useful illustrations.
We use mstate, an R package developed for multi-state duration modeling. Unfortunately, its vignette does not use tidy data and is not the most helpful. Another potentially relevant R package is TPmsm, which also estimates transition probabilities.
The JSS article introducing mstate and the multi-state and competing risk vignette are helpful.
Rpubs also has a useful review of survival analysis.
There are several ggplot wrappers for plotting survival objects, including survminer’s ggsurvplot(), and ggfortify allows survfit objects to be plotted with autoplot(), but I opt to skip the wrappers and use tidy data.
The tidy() function from the broom package converts survfit objects into tidy data frames (see the tidy.survfit documentation). Unfortunately, tidy() does not work for mstate and probtrans objects, so tidying those objects takes more work. Once I better understand the relationship between KM estimates, cumulative incidence functions, and transition probabilities, I may be able to bypass the mstate package. The inconsistency in naming and structure of the various survival analysis objects is very annoying.
First, we model the most common stages that rules go through with no covariates, only the dates of transition in and out of the pre-NPRM and NPRM stages. We use the DOT data with one observation per rule per stage.
Formatting data for multi-stage duration modeling requires identifying the dates of transitions to and from each stage. To do this we must define and index a set of transitions of interest. The data are formatted as one observation per rule per transition. The key variable is the days since the rulemaking project was initiated at which that transition occurred.
```r
# PREP DATA ###########################
# Here we take event data with rows indexed by ID and columns for relevant dates and covariates.
# The method takes date variables, melts the data to identify the order in which various transitions occur, and, finally, models transitions in and out of each stage as a multi-state survival model.
# This is a more generic approach than the mstate package vignettes as it does not require one to specify which transitions are possible or actually present in the data. Instead, I derive the transition matrix from the data.
# It also attempts to keep data in a tidy format (rather than special class objects).
# First we identify the order in which events occurred in order to identify different types of transitions.

# STEP 1: IDENTIFY TRANSITIONS AND DATES
# These data start as one observation per ID (in this case, "RIN"). I will be extending this approach to data frames of monthly observations for each RIN.
load(here("data/DOT-perRule.Rdata"))
# First, identify an ID variable so that we can split out transitions observed for each ID
d <- original <- rename(dotRIN, id = RIN)
# Select a set of events to model (we will merge back in with covariates later)
d %<>% select(id, Initiated, NPRMpublished, FinalRulePublished, WithdrawalPublished)
# Melt data to be one observation per event
d %<>%
  melt(id = "id", na.rm = T, value.name = "transition_date") %>%
  distinct()
# Convert dates to strings for now to make things easier (specifically, the super-useful mutate(ifelse()) method)
d$transition_date %<>% as.character()
# Group by RIN and identify sequence
d %<>%
  group_by(id) %>%
  arrange(transition_date) %>%
  rename(event = variable) %>%
  # clean up text for this example
  mutate(event = str_replace(event, " *$|published|Published", "")) %>%
  mutate(event = str_replace(event, "FinalRule", "Final Rule")) %>%
  # make a variable describing the path each took
  mutate(transitions = paste(event, collapse = " to "))
# Parse transitions from sequence (how do I make this more general?)
d %<>% separate(transitions, into = c("t1", "t2", "t3", "t4", "t5"), sep = " to ") %>%
  #mutate(Initiated = t1) %>%
  mutate(t1 = paste(t1, "to", t2)) %>%
  mutate(t2 = paste(t2, "to", t3)) %>%
  mutate(t3 = paste(t3, "to", t4)) %>%
  mutate(t4 = paste(t4, "to", t5))
# melt back down to one observation per id per transition event
d %<>% melt(id = c("id", "event", "transition_date"), na.rm = T, value.name = "transition") %>%
  rename(transition_count = variable) %>%
  filter(!grepl(" to NA", transition))
# identify entry and exit dates for each state
d %<>%
  group_by(id, transition) %>%
  mutate(exit_date = ifelse(event == gsub(".* to ", "", transition),
                            transition_date,
                            NA)) %>%
  mutate(exit_date = max(na.omit(exit_date))) %>%
  mutate(entry_date = ifelse(event == gsub(" to .*", "", transition),
                             transition_date,
                             NA)) %>%
  mutate(entry_date = max(na.omit(entry_date)))
# Now that we have identified transitions and the entry and exit dates, we no longer need the `event` and `transition_date` variables.
# However, we will need the event names for the transition matrix, so we save them.
events <- as.character(unique(d$event))
d %<>% select(-event, -transition_date) %>% distinct()

###############################
# STEP 2: MERGE BACK WITH DATA
d %<>% left_join(original) %>% ungroup() %>% arrange(id)
# Count days until event with `difftime()`
d %<>%
  mutate(exit = as.numeric(difftime(exit_date, Initiated, units = "days"))) %>%
  mutate(entry = as.numeric(difftime(entry_date, Initiated, units = "days")))
# Days until initiated is obviously 0, but difftime has odd rounding, so let us fix that:
d$entry[grepl("^Init", d$transition)] <- 0
# NOTE: THERE ARE SOME ERRORS IN THESE DATA (at least one left) THAT MAY NEED TO BE INVESTIGATED IF THEY PERSIST AFTER PROPER CODING OF TRANSITIONS
d %<>% filter(exit > entry) # FIXME
# Finally, we define the time until the event
d %<>% mutate(time = exit - entry)

##########################################################
# INDEX TRANSITIONS
d$transition %<>% as.factor()
trans <- tlevels <- levels(d$transition)
# Possible transitions from state i
for (i in 1:length(events)) {
  trans <- gsub(events[i], i, trans)
}
# Make a transition matrix (indexing transitions to and from each state)
trans <- str_split(trans, " to ", simplify = TRUE)
# Use the matrix to populate a list for transMat()
from <- list()
for (i in 1:length(events)) {
  from[[i]] <- as.numeric(trans[which(trans[, 1] == i), 2])
}
# Transition Matrix
tmat <- transMat(from, names = events)
# ALL OBS ARE TRANSITIONS; HOWEVER, WE CAN INCLUDE NON-TRANSITION OBS (i.e. months where no transition happened) -- DOES IT HELP? Perhaps only with covariates that have values for those non-transition months?
d$status <- 1
```
In estimation, the multi-stage duration model is stratified by transition.
A simple example: `fit <- survfit(Surv(entry, exit, status) ~ strata(transition), data = d)`. With the ggfortify package, `autoplot(fit)` will quickly plot the KM estimate for each stratum.
```r
fit <- survfit(Surv(entry, exit, status) ~ strata(transition), data = d)
autoplot(fit)
```
Now with tidy data:
```r
fit <- survfit(Surv(entry, exit, status) ~ strata(transition), data = d) %>%
  tidy()
fit %<>% # clean up text
  mutate(from = factor(paste("from", gsub(".*=| to .*", "", strata)))) %>%
  mutate(to = gsub(".*=|.* to ", "to ", strata)) %>%
  mutate(to = gsub(" *$", "", to))
fit %>% #filter(estimate > 0) %>%
  filter(time < 3650) %>% # limit to 10 year timeframe
  ggplot(aes(x = time/365, estimate)) +
  geom_line() +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  geom_point(shape = "+") +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .25) +
  facet_grid(to ~ from, scales = "free_x") +
  labs(x = "Years", y = "KM probability transition takes longer than t") +
  theme_bw()
```
Now with tidy data and a covariate:
```r
fit <- survfit(Surv(entry, exit, status) ~ MAJOR + strata(transition), data = d) %>%
  tidy()
fit %<>%
  separate(strata, c("Major", "strata"), ", ") %>%
  #filter(estimate > 0) %>%
  filter(time < 3650) %>% # limit to 10 year timeframe
  mutate(from = factor(paste("from", gsub(".*=| to .*", "", strata)))) %>%
  mutate(to = gsub(".*=|.* to ", "to ", strata)) %>%
  mutate(to = gsub(" *$", "", to))
fit %>%
  ggplot(aes(x = time/365, estimate)) +
  geom_line(aes(color = Major)) +
  scale_x_continuous(breaks = seq(0, 10, by = 1)) +
  geom_point(aes(color = Major), shape = "+") +
  geom_ribbon(aes(fill = Major, ymin = conf.low, ymax = conf.high), alpha = .25) +
  facet_grid(to ~ from, scales = "free_x") +
  labs(x = "Years", y = "KM probability transition takes longer than t") +
  theme_bw()
```
Now using the multi-state method:
QUESTION: I am not yet clear on exactly how type = “mstate” differs. Furthermore, these CIFs do not look right. I know that more rules go straight to the Final Rule stage than to Withdrawal, but survfit appears to drop this transition. Why?
```r
# Status is now multinomial by transition. This will help add non-event time observations (i.e. status != 1)?
fit <- survfit(Surv(time, status * as.numeric(transition), type = "mstate") ~ 1,
               data = d) %>% tidy()
# replace numeric transition index with transition names (note: levels with too few observations may be dropped)
fit$state %<>% as.factor()
levels(fit$state) <- levels(d$transition)[as.numeric(levels(fit$state))]
fit$Transition <- fit$state
# make faceting variables for each current and future state
fit %<>%
  mutate(to = gsub(".*=|.* to ", "to ", Transition)) %>%
  mutate(from = factor(paste("from", gsub(".*=| to .*", "", Transition))))
# plot
fit %>%
  ggplot(aes(x = time/365, color = Transition, fill = Transition)) +
  geom_step(aes(y = estimate)) +
  geom_ribbon(aes(ymax = estimate + 1.96*std.error,
                  ymin = estimate - 1.96*std.error),
              alpha = .2, color = NA) +
  # geom_text(aes(y = max, x = max(time/365), label = paste0(round(max*100), "%")), hjust = 1, vjust = -.2, color = "black") +
  labs(x = "Years", y = "Cumulative incidence of all transitions") +
  facet_grid(. ~ from) + # for comparing states without covariates
  #facet_grid(to ~ from) + # for comparing covariates within states
  theme_bw()
```
Normalize by state, because we care about the competing risks within each state:
```r
fit %<>%
  group_by(Transition) %>%
  mutate(max = max(estimate)) %>%
  ungroup() %>%
  group_by(from, time) %>%
  mutate(estimate = estimate/(sum(max))) %>%
  mutate(max = max/sum(max)) %>%
  ungroup()
# plot
fit %>% filter(time < 3650) %>%
  ggplot(aes(x = time/365, color = Transition, fill = Transition)) +
  geom_step(aes(y = estimate)) +
  geom_ribbon(aes(ymax = estimate + 1.96*std.error,
                  ymin = estimate - 1.96*std.error),
              alpha = .2, color = NA) +
  geom_text(aes(y = max, x = max(time/365), label = paste0(round(max*100), "%")), hjust = 1, vjust = -.2, color = "black") +
  labs(x = "Years", y = "Cumulative Incidence Function
normalized by stage") +
  facet_grid(. ~ from) + # for comparing states without covariates
  #facet_grid(to ~ from) + # for comparing covariates within states
  theme_bw()
```
Now the mstate model with a covariate:
```r
fit <- survfit(Surv(time, status * as.numeric(transition), type = "mstate") ~ MAJOR,
               data = d) %>% tidy()
# replace numeric transition index with transition names (note: levels with too few observations may be dropped)
fit$state %<>% as.factor()
levels(fit$state) <- levels(d$transition)[as.numeric(levels(fit$state))]
fit$Transition <- fit$state
# make faceting variables for each current and future state
fit %<>%
  mutate(to = gsub(".*=|.* to ", "to ", Transition)) %>%
  mutate(from = factor(paste("from", gsub(".*=| to .*", "", Transition))))
# plot
fit %>% filter(time < 3650) %>%
  ggplot(aes(x = time/365, color = strata, fill = strata)) +
  geom_step(aes(y = estimate)) +
  geom_ribbon(aes(ymax = estimate + 1.96*std.error,
                  ymin = estimate - 1.96*std.error),
              alpha = .2, color = NA) +
  # geom_text(aes(y = max, x = max(time/365), label = paste0(round(max*100), "%")), hjust = 1, vjust = -.2, color = "black") +
  labs(x = "Years", y = "Cumulative incidence of all transitions") +
  #facet_grid(. ~ from) + # for comparing states without covariates
  facet_grid(to ~ from) + # for comparing covariates within states
  theme_bw()
```
Normalize by state, because we care about the competing risks within each state:
```r
fit %<>%
  group_by(Transition, strata) %>%
  mutate(max = max(estimate)) %>%
  ungroup() %>%
  group_by(from, time, strata) %>%
  mutate(estimate = estimate/(sum(max))) %>%
  mutate(max = max/sum(max)) %>%
  ungroup()
# plot
fit %>% filter(time < 3650) %>%
  ggplot(aes(x = time/365, color = strata, fill = strata)) +
  geom_step(aes(y = estimate)) +
  geom_ribbon(aes(ymax = estimate + 1.96*std.error,
                  ymin = estimate - 1.96*std.error),
              alpha = .2, color = NA) +
  #geom_text(aes(y = max, x = max(time/365), label = paste0(round(max*100), "%")), hjust = 1, vjust = -.2, color = "black") +
  labs(x = "Years", y = "Cumulative Incidence Function
normalized by stage") +
  #facet_grid(. ~ from) + # for comparing states without covariates
  facet_grid(to ~ from) + # for comparing covariates within states
  theme_bw()
```
Now we use msfit() to extract transition probabilities and fitted values (hazards).
QUESTION: What is the relationship between transition probabilities and cumulative incidence?
```r
# MODEL WITH NO COVARIATES
d$trans <- d$transition
mod.1 <- coxph(Surv(time, status) ~ strata(trans), data = d, method = "breslow")
# Extract fitted values
fit <- msfit(mod.1, trans = tmat)
# Extract transition probabilities from each state
# predt = a positive number indicating the prediction time -- the time at which the prediction is made
pt0 <- probtrans(fit, predt = 1)
# tidy the mstate object
trans <- as.data.frame(pt0[[5]]) # transition matrix
p <- pt0[[1]] # estimate matrix
p$from <- paste(1, names(trans)[1]) # names from transition matrix
for (i in 2:length(events)) {
  pt <- pt0[[i]]
  pt$from <- paste(i, names(trans)[i])
  p <- rbind(p, pt)
}
# p$pstate2[which(p$from == "1 Initiated")] <- p$pstate2[which(p$from == "1 Initiated")] + p$pstate1[which(p$from == "2 NPRM")] +
#   p$pstate3[which(p$from == "2 NPRM")] +
#   p$pstate4[which(p$from == "2 NPRM")]
# melt into one observation per transition per time
p %<>% melt(id = c("time", "from"), value.name = "Probability")
# separate state from variable (estimate, error) name
p %<>% separate(variable, into = c("variable", "to"), sep = "e") %>%
  # spread estimates and errors into two variables
  spread(key = "variable", value = "Probability") %>%
  # name estimate and error variables
  rename(estimate = pstat, std.error = s)
# name transitions
p$to %<>% as.factor()
p$from %<>% as.factor()
levels(p$to) <- levels(p$from)
p %<>%
  mutate(Transition = paste0("Pr(", to, "|", from, ")")) %>%
  mutate(from = factor(paste("At state", from))) %>%
  mutate(to = factor(paste("Pr ", to)))
# plot
p %>% filter(time < 3650, !estimate %in% c(0, 1)) %>%
  ggplot(aes(x = time/365, color = Transition, fill = Transition)) +
  geom_step(aes(y = estimate)) +
  geom_ribbon(aes(ymax = estimate + 1.96*std.error,
                  ymin = estimate - 1.96*std.error),
              alpha = .2, color = NA) +
  #geom_text(aes(y = max, x = max(time/365), label = paste0(round(max*100), "%")), hjust = 1, vjust = -.2, color = "black") +
  labs(x = "Years", y = "Transition probability
from Aalen-Johansen estimator") +
  #facet_grid(. ~ from) + # for comparing states without covariates
  facet_grid(to ~ from) + # for comparing covariates within states
  theme_bw()
```
QUESTION: msfit() and probtrans() with a covariate do not work without the newdata argument?
The model is nearly identical to the one above:
```r
mod.1 <- coxph(Surv(time, status) ~ MAJOR + strata(trans), data = d, method = "breslow")
```
But extracting fitted values requires “newdata”; the documentation does not say why, though presumably the hazards must be evaluated at specific covariate values.
```r
fit <- msfit(mod.1, trans = tmat)
```
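If I read the mstate documentation correctly, with covariates in the model msfit() needs a newdata data frame with one row per transition, containing the covariate values at which to evaluate the hazards plus a `strata` column indexing the transitions. A sketch (an assumption on my part, not tested against our data; it reuses `mod.1` and `tmat` from above and assumes MAJOR is the only covariate):

```r
# Hypothetical sketch: evaluate transition-specific hazards for a MAJOR rule.
# One row per transition; the `strata` column gives the transition number
# from the transition matrix (max(tmat) = number of transitions).
nd <- data.frame(MAJOR = 1, strata = 1:max(tmat, na.rm = TRUE))
fit <- msfit(mod.1, newdata = nd, trans = tmat)
```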
Hazard rates stratified by transition from the mstate package:
```r
# NOTE: THIS CHUNK IS OLD CODE WRITTEN SPECIFICALLY FOR THIS CASE
# Thus, it has a different variable name than above.
# select vars for minimal model
d <- select(dotStage, DOTdate, RIN, STAGE, initiated, DaysUntilNPRM, DaysUntilFinalRule, DaysUntilWithdrawal) %>%
  # select minimal stages
  filter(!is.na(initiated), !is.na(DOTdate), STAGE %in% c("Proposed Rule", "Final Rule", "Withdrawal")) %>%
  distinct() %>%
  arrange(DOTdate) %>% arrange(RIN)
# Days until event
# FIXME
# note, below subs last observed date as published date where missing; not always correct
d %<>% mutate(DaysUntilNPRM = ifelse(is.na(DaysUntilNPRM) & STAGE == "Proposed Rule", DOTdate - initiated, DaysUntilNPRM))
d %<>% mutate(DaysUntilFinalRule = ifelse(is.na(DaysUntilFinalRule) & STAGE == "Final Rule", DOTdate - initiated, DaysUntilFinalRule))
d %<>% mutate(DaysUntilWithdrawal = ifelse(is.na(DaysUntilWithdrawal) & STAGE == "Withdrawal", DOTdate - initiated, DaysUntilWithdrawal))
# identify previous stages in the data (need to use previous stage var to supplement this)
d %<>% group_by(RIN) %>%
  mutate(stages = paste(STAGE, collapse = ";")) %>% ungroup()
d$trans <- NA
# to proposed rule = trans 1
d$trans[grepl("^Proposed Rule", d$stages) & d$STAGE == "Proposed Rule"] <- 1
# direct to final = trans 2
d$trans[d$stages == "Final Rule"] <- 2
# withdrawal = trans 3
d$trans[d$stages == "Proposed Rule;Withdrawal" & d$STAGE == "Withdrawal"] <- 3
# proposed to final = trans 4
d$trans[grepl("Proposed Rule;Final Rule", d$stages) & d$STAGE == "Final Rule"] <- 4
# Transition Matrix
tmat <- transMat(list(c(2, 4), # transitions from prerule
                      c(3, 4), # transitions from nprm
                      c(),     # transitions from withdrawal
                      c()),    # transitions from final
                 names = c("preNPRM", "NPRM", "Withdrawal", "Final"))
# USE INDEX TO ID ENTRY AND EXIT DATES
# prerule stage: entry = initiated, exit = NPRM published, status = 1 if DOT report date = NPRM published date
# obs entering at date initiated
d %<>% mutate(entry = ifelse(trans %in% c(1, 2), 0, NA))
# obs entering at NPRM date
d %<>% mutate(entry = ifelse(trans %in% c(3, 4), DaysUntilNPRM, entry))
# obs exiting at Final Rule published date
d %<>% mutate(exit = ifelse(trans %in% c(2, 4), DaysUntilFinalRule, NA))
# obs exiting at Withdrawal published date
d %<>% mutate(exit = ifelse(trans %in% c(3), DaysUntilWithdrawal, exit))
# obs exiting at NPRM published date
d %<>% mutate(exit = ifelse(trans %in% c(1), DaysUntilNPRM, exit))
# NOTE: THERE ARE SOME ERRORS THAT MAY NEED TO BE INVESTIGATED IF THEY PERSIST AFTER PROPER CODING OF TRANSITIONS
d %<>% filter(exit > entry) # FIXME
# ALL OBS ARE TRANSITIONS; HOWEVER, WE CAN INCLUDE NON-TRANSITION OBS (i.e. months where no transition happened) -- DOES IT HELP? Perhaps only with covariates that have values for those non-transition months?
d$status <- 1

# MODEL WITH NO COVARIATES
mod.1 <- coxph(Surv(entry, exit, status) ~ strata(trans), data = d, method = "breslow")
# # INSPECT
# summary(mod.1)
# class(mod.1$xlevels)
# Extract fitted values
fit <- msfit(mod.1, trans = tmat, variance = TRUE)
# CAUTION: in one run, Withdrawals got dropped, making 4 (NPRM to Final) the 3rd factor. Not cool.
f <- fit$Haz
# Name stage for trans 1 and 2
f %<>% mutate(Stage = ifelse(trans %in% c(1, 2), " Pre NPRM Stage", "Post NPRM Stage"))
# Name transitions
f$trans %<>% {gsub(1, "Initiated to NPRM", .)}
f$trans %<>% {gsub(2, "Direct to Final", .)}
f$trans %<>% {gsub(3, "NPRM to Withdrawal", .)}
f$trans %<>% {gsub(4, "NPRM to Final", .)}
# Name variables
names(f) <- c("Days", "Hazard", "Transition", "Stage")
# plot
ggplot(f) +
  geom_line(aes(x = Days, y = Hazard, color = Transition)) +
  facet_grid(. ~ Stage) +
  theme_bw()
```
Google spreadsheet of variables
Some are measured at the rule level, others at the agency level, and others for all rules at a given point in time. Many vary by month or year. We will include a frailty by rule and may need to cluster standard errors by rule or agency, depending on the covariates.
Many covariates may be correlated and we really should be modeling them as indicators of the latent factors that affect policymaking. For example, observed covariates may be grouped into broader, more discrete concepts like the resources the agency has, the technical complexity of the rule, and the political contentiousness of the rule.
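As a purely illustrative sketch of this idea (the variable names below are hypothetical placeholders, and the data are simulated), exploratory factor analysis on standardized covariates could test whether they load onto a small number of latent factors such as resources and contentiousness:

```r
# Illustrative only: do observed covariates load onto a few latent factors?
# All variable names are hypothetical; random data stand in for real measures.
set.seed(2)
X <- data.frame(
  budget              = rnorm(100),  # resources
  ses_per_rule        = rnorm(100),  # resources
  abstract_complexity = rnorm(100),  # complexity
  second_comment      = rnorm(100),  # complexity
  n_comments          = rnorm(100),  # contentiousness
  litigation_threat   = rnorm(100)   # contentiousness
)
fa <- factanal(scale(X), factors = 2, rotation = "varimax")
print(fa$loadings, cutoff = 0.3)
```

With real data, high loadings of the resource measures on one factor and the contentiousness measures on another would support treating them as indicators of those latent concepts.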
Resources measures:

- sub-agency budget? (can we get the budget for rulemaking activities? Is this different than estimating SES?)
- “lack of resources” or “lack of staffing” claimed [0,1], a subset of delayed rules
- number of SES / number of rules per year (ideally at the sub-agency level if we have SES data for that)
- secretary prompt [0,1] (how many missing?)
- Non-acting subagency head [0,1]
- Same president as initiation date [0,1]
- Same secretary as initiation date [0,1]
- Same subagency head as initiation date [0,1]

Complexity measures:

- “coordination necessary” claimed [0,1], a subset of delayed rules
- “unanticipated issues/impacts requiring further analysis”*“awaiting development of additional data” claimed [0,1], a subset of delayed rules
- the score of the abstract on Benoit’s political sophistication/complexity measure
- second comment period [0,1] (though this may also be a measure of external, interest-group constraint)
- the number of comments? (to me this is more about salience and contentiousness than complexity)
- MORE MEASURES?

White House and Secretary alignment measures:

- “other, higher priorities” claimed [0,1] (is it right to interpret it this way? Priority could also be internal, which would make it, what? Not a resource per se)
- initiated under the same president [0,1] (same-party president? we have a few started under Clinton, finished under Obama)
- went to the Secretary twice [0,1]
- days between going to the Secretary and going to OIRA (average if multiple times)
- went to OIRA twice [0,1]
- Did OIRA make changes [0,1]
- days between going to OIRA and publication (count days from the first or last time at OIRA?)

Congress alignment measures:

- positive minus negative sentiment words in the subcommittee budget report
- positive minus negative sentiment words in the subcommittee hearing
- legislative deadline exists [0,1]
- legislative deadline passed [0,1]
- prompting action is legislation [0,1] (speeds up rulemaking?)
- legislation is pending [0,1] (slows rulemaking?)
- Reg flex required [0,1] (slowing?)
- unified Congress

Litigation threat measures:

- court deadline exists [0,1]
- court deadline passed [0,1]
- number of lawsuits against the subagency in the past x years
- average number of lawsuits against the agency per year of an administration (this is perhaps better suited for a model that is not just DOT)
- litigation threatened in comments? [0,1]
Below are examples of positive and negative sentiment about the FAA from the 2018 appropriations subcommittee report. Much of it is neutral, and it is fair to say that this language reflects genuinely neutral attitudes. Sentiment analysis allows us to tally the positive and negative words for each subagency. Other scholars have measured the length of the section on each subagency, but length mostly reflects the size and diversity of its budget.
POSITIVE:
The Committee is encouraged by the additional measures the FAA is taking to enhance outreach to communities affected by new flightpaths. The Committee recommendation includes an additional $2,000,000 to support the FAA’s ongoing efforts to address community noise concerns. The Committee supports research that is being conducted through the FAA’s Center of Excellence for Alternative Jet Fuel and Environment, the Aviation Sustainability Center (ASCENT) on the impact of aviation noise on both sleep and cardiovascular health. The Committee commends the FAA Office of Commercial Space Transportation’s efforts to promote private sector lunar exploration and development and encourages the FAA to explicitly define non-interference and to enhance its payload review process to provide companies planning private sector lunar development with the security and predictability necessary to support substantial investments.
NEGATIVE:
The Committee is concerned by the FAA’s decision to extend by five years its aging Mission Support Network, a decision which carries both technology and cost implications.
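The tallying step can be sketched with a toy lexicon drawn from the passages above. (The two word lists are placeholders of my own; in practice an established sentiment dictionary would replace them.)

```r
# Toy sentiment tally: count positive and negative lexicon words in a text.
# The tiny lexicons below are illustrative placeholders, not a real dictionary.
positive <- c("encouraged", "commends", "supports")
negative <- c("concerned", "aging")

count_hits <- function(text, lexicon) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))  # crude tokenizer
  sum(words %in% lexicon)
}

faa_text <- "The Committee is concerned by the FAA's decision to extend its aging network."
net_sentiment <- count_hits(faa_text, positive) - count_hits(faa_text, negative)
net_sentiment  # negative values indicate more negative than positive words
```

Applied to each subagency’s section of the report, this yields the positive-minus-negative word counts proposed among the Congress alignment measures above.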
List of unique rule stages and how we code each (Note /2 means the second such document for the same RIN):
[1] “Final Rule” = Final Rule
[2] “Undetermined” = Undetermined
[3] “NPRM” = Proposed Rule
[4] “SNPRM” = SNPRM (sometimes folded into Proposed Rule)
[5] “Interim Final Rule/4” = Interim Final Rule
[6] “Interim Final Rule” = Interim Final Rule
[7] “Withdrawal” = Withdrawal
[8] “Other” = Undetermined
[9] “ANPRM” = Prerule
[10] “Final Rule/2” = Final Rule/2
[11] “NPRM/2”= Proposed Rule
[12] “SNPRM/2”= SNPRM
[13] “Notice” = Notice
[14] “Supplemental Notice of Intent” = Prerule
[15] “Reconsideration of Final Rule” = Final Rule
[16] “Disposition of Comments” = Other
[17] “Request for Comments” = Other
[18] “Final Rule with Request for Comments” = Final Rule
[19] “Termination” = Termination (sometimes folded into Withdrawal)